KMID : 0644020190320030047
Journal of Korean Medical Classics, 2019, Vol. 32, No. 3, pp. 47-57
Comparison of Word Extraction Methods Based on Unsupervised Learning for Analyzing East Asian Traditional Medicine Texts
Oh Jun-Ho
Abstract
Objectives: This study aims to help researchers choose an appropriate word extraction method when analyzing East Asian Traditional Medicine texts with unsupervised learning.
Methods: To rank substrings, we tested one method for the exterior boundary value (BE: branching entropy), three methods for the interior boundary value (CS: cohesion score, TS: t-score, SL: simple-ll), and six methods that combine them (BExSL, BExTS, BExCS, CSxTS, CSxSL, TSxSL).
Results: With Miss Rate (MR) as the criterion, the error was smallest when TS and SL were used together and largest when CS was used alone. When the number of segmented texts was applied as a weight, SL gave the best results and BE alone gave the worst.
Conclusions: Unsupervised-learning-based word extraction can be used to analyze texts without a prepared vocabulary. When using this approach, SL or the combination of SL and TS should be considered first.
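The four scores named in the Methods can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not give formulas, so the code assumes commonly used textbook definitions (right branching entropy over the next character, cohesion as the length-normalized ratio of substring count to first-character count, and t-score and simple-ll as observed-versus-expected association measures for a two-part split). The function names, the toy corpus, the choice of corpus size `n_chars` for the expected frequency, and the reading of "x" in names such as BExCS as a product of two scores are all assumptions made for illustration.

```python
import math
from collections import Counter


def substring_counts(texts, max_len=4):
    """Count every substring of length 1..max_len in the corpus."""
    counts = Counter()
    for text in texts:
        for i in range(len(text)):
            for j in range(i + 1, min(i + max_len, len(text)) + 1):
                counts[text[i:j]] += 1
    return counts


def branching_entropy(sub, counts):
    """Exterior score: entropy of the character that follows `sub`.

    Requires that `counts` holds substrings of length len(sub) + 1.
    """
    followers = [c for s, c in counts.items()
                 if len(s) == len(sub) + 1 and s.startswith(sub)]
    total = sum(followers)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in followers)


def cohesion_score(sub, counts):
    """Interior score: (count(sub) / count(first char)) ** (1 / (len - 1))."""
    if len(sub) < 2 or counts[sub[0]] == 0:
        return 0.0
    return (counts[sub] / counts[sub[0]]) ** (1 / (len(sub) - 1))


def _observed_expected(sub, split, counts, n_chars):
    """Observed count of `sub` and its expected count if the two parts
    sub[:split] and sub[split:] occurred independently (assumed model)."""
    observed = counts[sub]
    expected = counts[sub[:split]] * counts[sub[split:]] / n_chars
    return observed, expected


def t_score(sub, split, counts, n_chars):
    """Interior score: (O - E) / sqrt(O)."""
    o, e = _observed_expected(sub, split, counts, n_chars)
    if o == 0 or e == 0:
        return 0.0
    return (o - e) / math.sqrt(o)


def simple_ll(sub, split, counts, n_chars):
    """Interior score: 2 * (O * ln(O / E) - (O - E))."""
    o, e = _observed_expected(sub, split, counts, n_chars)
    if o == 0 or e == 0:
        return 0.0
    return 2 * (o * math.log(o / e) - (o - e))


# Toy corpus (hypothetical; real input would be classical medical text).
texts = ["abc", "abd", "abc"]
counts = substring_counts(texts, max_len=3)
n_chars = sum(len(t) for t in texts)

# Assumed reading of a combined method such as BExCS: product of two scores.
be_x_cs = branching_entropy("ab", counts) * cohesion_score("ab", counts)
print("BE(ab) =", branching_entropy("ab", counts))
print("CS(ab) =", cohesion_score("ab", counts))
print("TS(bc) =", t_score("bc", 1, counts, n_chars))
print("SL(bc) =", simple_ll("bc", 1, counts, n_chars))
print("BExCS(ab) =", be_x_cs)
```

Under these assumptions, a substring ranks higher when the characters inside it co-occur more often than chance (interior scores) and when the character following it is hard to predict (exterior score). Ranking candidates by one score or by a combination is the step the abstract compares.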
Keywords
Text segmentation, word extraction, tokenization, East Asian Traditional Medicine, Korean medicine